CDP workflow


This report runs a data quality check across the CDP questions related to **Waste data**.

It is the first step of a process that ends with the dataset uploaded to the Data Warehouse, where tools such as Qlik can use it for dashboards and reports.

How to read the Quality Checks report?



Number of cities that responded per question


Number of cities that responded per indicator
| Indicator | Responses (amount) | Responses (percentage) |
|---|---|---|
| Percentage of the diverted solid waste generated that is recycled (%) | 59 | 75.6% |
| Percentage of the diverted solid waste generated that is reused (%) | 23 | 29.5% |
| Percentage of the total solid waste generated that is diverted away from landfill and incineration (%) | 61 | 78.2% |
| Percentage of the total solid waste generated that is utilized for waste to energy (%) | 48 | 61.5% |
| Percentage of waste collected where separation at source is taking place (%) | 40 | 51.3% |
| Percentage of wastewater safely treated to at least secondary level (%) | 44 | 56.4% |
| Total amount of solid waste generated (tonnes/year) | 76 | 97.4% |
| Total annual amount of food waste produced in the jurisdiction (tonnes/year) | 52 | 66.7% |
| Volume of wastewater produced within the jurisdiction boundary (megalitres/year) | 51 | 65.4% |
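The counts above can be reproduced with a few lines of base R. This is a minimal sketch, not the code behind the report: the `responses` table and its column names (`city`, `indicator`, `response`) are hypothetical, and it assumes one row per city and indicator with `NA` for unanswered questions. The published percentages appear consistent with 78 responding cities (for example, 59 / 78 ≈ 75.6%).

```r
# Hypothetical long-format responses: one row per city and indicator.
responses <- data.frame(
  city = c("A", "B", "C", "A", "B", "A"),
  indicator = c("recycled", "recycled", "recycled",
                "reused", "reused", "generated"),
  response = c(10, NA, 35, 5, 7, 1200),
  stringsAsFactors = FALSE
)

# Denominator: every city that appears in the table.
n_cities <- length(unique(responses$city))

# Keep only rows with an actual answer, then count distinct cities per indicator.
answered <- responses[!is.na(responses$response), ]
amount <- tapply(answered$city, answered$indicator,
                 function(x) length(unique(x)))

summary_tbl <- data.frame(
  indicator = names(amount),
  amount = as.integer(amount),
  percentage = round(100 * as.integer(amount) / n_cities, 1),
  stringsAsFactors = FALSE
)

print(summary_tbl)
```

Counting distinct cities rather than rows keeps the figure correct if a city happens to appear more than once for the same indicator.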
Data Quality Report
QA checks - Check columns structure

Table: waste cdp questions (tibble) · Action levels shown: WARN 0.00, STOP, NOTIFY

| STEP | Function | Description | COLUMNS | UNITS | PASS | FAIL |
|---|---|---|---|---|---|---|
| 15 | col_is_numeric() | Check that column is numeric | response_orig | 1 | 0 (0.00) | 1 (1.00) |
| 16 | col_is_numeric() | Check that column is numeric | response_suggested | 1 | 0 (0.00) | 1 (1.00) |
| 17 | col_is_integer() | Check that column is integer | year_orig | 1 | 0 (0.00) | 1 (1.00) |
| 18 | col_is_integer() | Check that column is integer | year_suggested | 1 | 0 (0.00) | 1 (1.00) |

Started: 2024-07-01 17:20:46 -03 · Duration: < 1 s · Ended: 2024-07-01 17:20:46 -03
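The step names in this section match validation functions from the R package pointblank, so the structure checks can be sketched as below. This is an illustration under assumptions, not the report's actual code: the `waste` data frame and its values are made up, and only the four column names come from the report. In the report every structure step has 0 passing units, which is what would happen if the response columns were still stored as text and the year columns as doubles rather than integers; the sample data reproduces that situation.

```r
library(pointblank)

# Hypothetical stand-in for the waste table.
# Responses are character and years are double, so all four checks fail.
waste <- data.frame(
  response_orig = c("12.5", "40", "n/a"),
  response_suggested = c("12.5", "40", NA),
  year_orig = c(2019, 2021, 2020),
  year_suggested = c(2019, 2021, 2020),
  stringsAsFactors = FALSE
)

agent <- create_agent(
  tbl = waste,
  tbl_name = "waste cdp questions",
  label = "QA checks - Check columns structure"
)

# Each column listed in vars() becomes its own validation step.
agent <- col_is_numeric(agent, columns = vars(response_orig, response_suggested))
agent <- col_is_integer(agent, columns = vars(year_orig, year_suggested))
agent <- interrogate(agent)

# FALSE here: none of the columns has the expected type yet.
structure_ok <- all_passed(agent)
print(structure_ok)
```

A column-type step has a single test unit (the column itself), which is why UNITS is 1 for every row of the structure report.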
Data Quality Report
QA checks - Check quality of question values

Table: Waste cdp questions (tibble) · Action levels shown: WARN 0.00, STOP, NOTIFY

| STEP | Function | Description | COLUMNS | VALUES | UNITS | PASS | FAIL |
|---|---|---|---|---|---|---|---|
| 2 | col_vals_between() | Check that the year column is between 2000 and the current year (2024) | year_suggested | [2000, 2024] | 488 | 487 (0.99) | 1 (0.01) |
| 4 | col_vals_regex() | Check that values contain numbers only | response_suggested | ^[0-9] | 488 | 487 (0.99) | 1 (0.01) |
| 6 | col_vals_regex() | Check that values contain numbers only | year_suggested | ^[0-9] | 488 | 487 (0.99) | 1 (0.01) |
| 7 | col_vals_equal() | Check comments where the word "wrong" or "probably/likely wrong" is mentioned | response_wrong | 0 | 488 | 468 (0.96) | 20 (0.04) |
| 8 | col_vals_equal() | Check comments where the word "wrong" or "probably/likely wrong" is mentioned | response_prob_likely_wrong | 0 | 488 | 481 (0.99) | 7 (0.01) |
| 9 | col_vals_between() | Check that percentage indicator values are between 0 and 100 | response_orig | [0, 100] | 302 | 289 (0.96) | 13 (0.04) |
| 10 | col_vals_between() | Check that percentage indicator values are between 0 and 100 | response_suggested | [0, 100] | 302 | 298 (0.99) | 4 (0.01) |

Started: 2024-07-01 17:20:46 -03 · Duration: < 1 s · Ended: 2024-07-01 17:20:46 -03
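The value checks above can be sketched with pointblank as follows. The data frame, its values, and the `indicator_unit` column are hypothetical; only the other column names are taken from the report. Two points are worth noting. First, the percentage steps cover 302 of the 488 rows, which suggests they ran on a subset; the sketch assumes a filter on percentage indicators, expressed through `preconditions`. Second, the pattern `^[0-9]` only requires the first character to be a digit, so a value such as `8o` would pass; the sketch uses a stricter pattern as one possible alternative.

```r
library(pointblank)

current_year <- as.integer(format(Sys.Date(), "%Y"))

# Hypothetical sample: four rows, three of them percentage indicators.
waste <- data.frame(
  indicator_unit = c("%", "%", "tonnes/year", "%"),
  response_orig = c(45, 120, 35000, 80),
  response_suggested = c("45", "12", "35000", "8o"),
  year_suggested = c(2021L, 2019L, 2022L, 1999L),
  response_wrong = c(0L, 1L, 0L, 0L),
  stringsAsFactors = FALSE
)

agent <- create_agent(tbl = waste, tbl_name = "waste cdp questions")

# Year must fall between 2000 and the current year.
agent <- col_vals_between(agent, columns = vars(year_suggested),
                          left = 2000, right = current_year)

# Stricter than "^[0-9]": the whole value must be digits, optionally a decimal part.
agent <- col_vals_regex(agent, columns = vars(response_suggested),
                        regex = "^[0-9]+(\\.[0-9]+)?$")

# Flag column must be 0, i.e. the comment does not say the value is wrong.
agent <- col_vals_equal(agent, columns = vars(response_wrong), value = 0)

# Range check only on percentage indicators (assumed filter).
agent <- col_vals_between(
  agent, columns = vars(response_orig), left = 0, right = 100,
  preconditions = function(x) x[x$indicator_unit == "%", , drop = FALSE]
)

agent <- interrogate(agent)

# Per-step counts of test units and failures.
results <- get_agent_x_list(agent)
print(data.frame(units = results$n, failed = results$n_failed))
```

Each row of the table is one test unit for these steps, so the PASS and FAIL columns count rows rather than columns.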



Check columns that seem to be merged (they appear as lists in the data frame)
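One way to find such columns is to test each column with `is.list()` and then look for cells that hold more than one value. The sketch below uses base R only; the `waste` data frame and its contents are hypothetical.

```r
# Hypothetical table where one column was read in as a list.
waste <- data.frame(city = c("A", "B", "C"), stringsAsFactors = FALSE)
waste$response_orig <- list(12.5, c(40, 41), 7)

# Names of the columns stored as lists instead of atomic vectors.
list_cols <- names(waste)[vapply(waste, is.list, logical(1))]

# Rows where a list cell holds more than one value (likely merged input).
multi_value_rows <- which(lengths(waste$response_orig) > 1)

print(list_cols)
print(waste$city[multi_value_rows])
```

Rows flagged this way need a manual decision (keep the first value, split into several rows, or discard) before the column can be converted to a numeric vector.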



Check values as numbers, clean them, and check whether any value got lost
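A minimal sketch of this step in base R: strip formatting characters, convert to numeric, and compare against the raw input to list the values that became `NA`. The sample values are made up, and treating the comma as a thousands separator is an assumption that would be wrong for data using decimal commas.

```r
# Hypothetical raw answers as they might arrive from the questionnaire.
raw <- c("1,200", "35 %", "n/a", "48.5", NA, "approx. 70")

# Remove thousands separators and percent signs, then trim whitespace.
cleaned_txt <- trimws(gsub("[,%]", "", raw))

# Convert to numeric; anything still non-numeric becomes NA.
# The coercion warning is suppressed because losses are checked explicitly below.
cleaned_num <- suppressWarnings(as.numeric(cleaned_txt))

# A value "got lost" if it was present before cleaning but is NA afterwards.
lost <- !is.na(raw) & is.na(cleaned_num)

print(raw[lost])
```

Reviewing `raw[lost]` before discarding anything shows whether the cleaning rules need extending or whether the entries are genuinely unusable.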